Abstract. In this paper we present a simple linear-time algorithm constructing a context-free
grammar of size $O(g \log(N/g))$ for the input string, where $N$ is the size of the input
string and $g$ the size of the optimal grammar generating this string. The algorithm works
for arbitrary size alphabets, but the running time is linear assuming that the alphabet $\Sigma$
of the input string can be identified with numbers from $\{1, \ldots, N^c\}$ for some constant $c$.
Otherwise, additional cost of $O(n \log|\Sigma|)$ is needed.
Algorithms with such approximation guarantees and running time are known; the novelty
of this paper is the particular simplicity of the algorithm as well as of its analysis,
which uses a general technique of recompression recently introduced by the author.
Furthermore, contrary to the previous results, this work does not use the LZ representation
of the input string in the construction, nor in the analysis.

1. Introduction 

1.1. Grammar based compression. This paper presents an alternative linear-time
approximation algorithm for the construction of the smallest grammar generating a given string
T. There are three known algorithms with an approximation ratio $O(\log(N/g))$, where N is
the input-string length and g is the size of the optimal grammar [15\ [TJ [TB] . The novelty of
the proposed algorithm is its apparent simplicity (it uses only local replacement of strings)
and an analysis that uses the recompression technique developed recently by the author. In
particular, neither the algorithm nor its analysis relates to the LZ compression, which was
the case for the previously known algorithms.

In grammar-based compression, a text is represented by a context-free grammar (CFG)
generating exactly one string. The idea behind this approach is that a CFG can compactly
represent the structure of the text, even if this structure is not apparent. Furthermore, the
natural hierarchical definition of context-free grammars makes such a representation suitable
for algorithms, in which case the string operations can be performed on the compressed
representation, without the need for explicit decompression [31 El El [HI [U [T] . Lastly, there
is a close connection between block-based compression methods and grammar compression:
it is fairly easy to rewrite the LZW definition as a CFG that is larger by a factor of $O(1)$; LZ77 can
also be presented in this way, but this is much less obvious (and introduces a $\log(N/\ell)$ blow-up in
size, where $\ell$ is the size of the LZ77 representation) [T^ IT].

While grammar-based compression was introduced with practical purposes in mind and
the paradigm was used in several implementations [TUl [21 [E], it also turned out to be very
useful in more theoretical considerations. Intuitively, in many cases large data have a relatively
simple inductive definition, which results in a grammar representation of small size. On the
other hand, it was already mentioned that the hierarchical structure of a context-free
grammar allows operations directly on the compressed representation. A recent survey by
Lohrey [TT] gives a comprehensive description of several areas of theoretical computer science
in which grammar-based compression was successfully applied.

The main drawback of grammar-based compression is that producing the smallest
CFG for a text is difficult: the decision problem is NP-hard [13] and the size of the grammar
cannot be approximated within a constant factor [T]. Furthermore, even in the extremely simple
case of texts of the form $a^{\ell_1} b a^{\ell_2} b \cdots b a^{\ell_k}$, construction of the grammar is equivalent (up to
a constant factor) to construction of an addition chain for the sequence $\ell_1 < \ell_2 < \cdots < \ell_k$,
and no algorithm with an approximation guarantee better than
$O\left(\frac{\log(\sum_{i=1}^{k} \ell_i)}{\log\log(\sum_{i=1}^{k} \ell_i)}\right)$
for this problem is known [TH], despite heavy research. Thus, it is unlikely that an algorithm with
an approximation guarantee $o\left(\frac{\log N}{\log\log N}\right)$ will be found.

1.2. Approximation. The hardness of the problem naturally leads to two directions of
research: on one hand, several heuristics are considered [TUIEIIIS]; on the other, approximation
algorithms with a guaranteed approximation ratio are proposed. In this paper we consider
only the latter.

The first algorithm with approximation ratio $O(\log(N/g))$ is due to Rytter [T2]. His
algorithm first applied the LZ77 compression to the input string and then transformed the
obtained LZ77 representation to a grammar, yielding a grammar of size $O(\ell \log(N/\ell))$, where
$\ell$ is the size of the LZ77 representation. It is known that $\ell \le g$ and, as $f(x) = x \log(N/x)$
is increasing, the bound $O(\log(N/g))$ on the approximation ratio follows. The crucial part of
the construction was the requirement that the intermediate constructed grammar defines
a derivation tree satisfying the AVL condition. The bound on the running time and the
approximation guarantee were all consequences of the balanced form of the derivation tree
and of the known algorithms for merging, splitting, etc. of AVL trees.
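
The monotonicity step can be checked directly. The following is only a sketch, taking logarithms to base 2; the split at $N/e$ is not spelled out in the text above and is added here as a worked detail:

```latex
\begin{align*}
f(x) &= x \log\frac{N}{x}, \qquad
f'(x) = \log\frac{N}{x} - \log e \;\ge\; 0 \iff x \le \frac{N}{e}, \\
\ell \le g \le \frac{N}{e} &\;\Longrightarrow\; \ell \log\frac{N}{\ell} \le g \log\frac{N}{g}, \\
g > \frac{N}{e} &\;\Longrightarrow\; \ell \log\frac{N}{\ell} \le \max_{x} f(x)
   = \frac{N}{e} \log e < g \log e = O(g).
\end{align*}
```

In both ranges the resulting grammar has size $O(g \log(N/g) + g)$.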

The second algorithm, given by Charikar et al. [1], independently followed more or less
the same path, with a different condition imposed on the grammar: it was required that its
derivation tree is length-balanced, i.e. for a rule $X \to YZ$ the lengths of words generated by
Y and Z are within a certain multiplicative constant factor from each other. For such trees
efficient implementations of merging, splitting etc. were given (constructed from
scratch) by the authors and so the same running time as in the case of the AVL trees was
obtained.

Lastly, Sakamoto [TOI] proposed a different algorithm, based on RePair [TU], which is one
of the practically implemented and used algorithms for grammar-based compression. His
algorithm iteratively replaced pairs of different letters and maximal blocks of letters ($a^\ell$ is
a maximal block if it cannot be extended by a on either side). A special pairing of the
letters was devised, so that it is 'synchronising': if w is sufficiently long, then the letters in any two
of its appearances in the text are paired and compressed in the same way, except perhaps for
$O(1)$ letters at the beginning and end. The analysis was based on considering the LZ77
representation of the text and proving that due to 'synchronisation' the factors of LZ77 are
compressed very similarly to the text to which they refer.

However, to the author's best knowledge and understanding, the presented analysis [IB] is
incomplete, as the cost of nonterminals introduced for the representation of maximal blocks
is not bounded in the paper, see the appendix; the bound that the author was able to obtain
using the approach presented there is $O(\log(N/g)^2)$, so worse than claimed.

1.3. Proposed approach: recompression. In this paper another algorithm is proposed.
It is constructed using the general approach of recompression, developed by the author, based
on iterative application of two replacement schemes performed on the text T:

pair compression of ab: For two different symbols (i.e. letters or nonterminals) a, b
such that the substring ab appears in T, replace each appearance of ab in T by a fresh nonterminal c.

a's block compression: For each maximal block $a^\ell$, where a is a letter or a nonterminal
and $\ell > 1$, that appears in T, replace all $a^\ell$'s in T by a fresh nonterminal $a_\ell$.

In the beginning of a phase, all pairs appearing in the current text are listed in P; similarly,
L contains all letters appearing in the current text. Then pair compression is applied to an
appropriately chosen subset of P and block compression to all blocks of symbols from L, after
which the phase ends. Ideally, each symbol of T is replaced and so its length drops by half;
in reality the text length drops by some smaller, but constant, factor per phase. For the sake
of simplicity, we treat all nonterminals introduced by the algorithm as letters.
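
The two replacement schemes can be illustrated on a plain list of letters. This is a minimal sketch, not the linear-time implementation discussed later; the function names `compress_blocks` and `compress_pair`, the integer encoding of letters and the `fresh` callback are assumptions made for the example.

```python
from itertools import count, groupby

def compress_blocks(text, fresh):
    """Replace each maximal block a^l with l > 1 by one letter; equal blocks share it."""
    names, out = {}, []
    for a, run in groupby(text):
        length = sum(1 for _ in run)
        if length == 1:
            out.append(a)
            continue
        if (a, length) not in names:
            names[(a, length)] = fresh()     # hypothetical source of fresh letters
        out.append(names[(a, length)])
    return out, names

def compress_pair(text, a, b, c):
    """Replace every appearance of ab by c; since a != b, appearances cannot overlap."""
    assert a != b
    out, i = [], 0
    while i < len(text):
        if i + 1 < len(text) and text[i] == a and text[i + 1] == b:
            out.append(c)
            i += 2
        else:
            out.append(text[i])
            i += 1
    return out

letters = count(3)                           # letters 1 and 2 are in use; 3, 4, ... are fresh
text, names = compress_blocks([1, 1, 1, 2, 1, 1, 1, 2, 2], lambda: next(letters))
text = compress_pair(text, 3, 2, next(letters))
```

Here the blocks $1^3$ and $2^2$ are replaced first, and then the pair formed by the letter for $1^3$ followed by 2 is compressed.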

In the author's previous work it was shown that such an approach can be efficiently applied
to text represented in a grammar-compressed form. In this way new results for the compressed
membership problem [H], fully compressed pattern matching [B] and word equations [7] were
obtained. In this paper a somewhat opposite direction is followed: the recompression method
is applied to the input string. This yields a simple linear-time algorithm: performing one
phase in $O(|T|)$ running time is relatively easy, and since the length of T drops by a constant
factor in each phase, the $O(N)$ running time is obtained.

However, the more interesting part is the analysis, and not the algorithm itself: it is performed
by applying the recompression to the optimal grammar G for the input text. In this way,
the current G always generates the current string kept by the algorithm and the number of
nonterminals introduced during the construction can be calculated in terms of $|G| = g$.

We show that a relatively straightforward analysis yields that the generated grammar
is of size $O(g \log N)$; a slightly more involved algorithm, which combines the recompression
technique with a naive approach that generates a grammar of size $O(N)$, yields a grammar
of size $O(g \log(N/g) + g)$.

1.4. Advantages of the proposed technique. We believe that the proposed algorithm
is interesting, as it is very simple and its analysis for the first time does not rely on the LZ77
representation of the string. Potentially this can help both in the design of an algorithm with a
better approximation ratio and in showing a logarithmic lower bound: observe that the LZ77
representation is known to be at most as large as the corresponding grammar, so it might
be that some algorithm produces a grammar of size $o(g \log(N/g))$, even though this is of
size $\Omega(\ell \log(N/\ell))$, where $\ell$ is the size of the LZ77 representation of the string. Secondly, as
the analysis 'considers' the optimal grammar, it may be much easier to observe where any
approximation algorithm performs badly, and so to try to approach a logarithmic lower bound.
This is much harder to imagine when the approximation analysis is done in terms of
LZ77.

Comparison with Sakamoto's algorithm. The general approach is similar to Sakamoto's
method; however, the pairing of letters seems more natural. Also, the construction of
nonterminals for blocks of letters is different: the author failed to show that the bound actually
holds for the variant proposed by Sakamoto. It should be noted that in the analysis presented
in this paper the calculation of nonterminals used due to pair compression is fairly easy,
while estimating the number used for block compression is non-obvious. Also, the connection
to addition chains suggests that the compression of blocks is the difficult part of the
smallest grammar problem.

Note on computational model. The presented algorithm runs in linear time, assuming that
$\Sigma$ can be identified with a contiguous subset of natural numbers of size $O(N^c)$ for some
constant c and that RadixSort can be performed on it. Should this not be the case for
the input, we can replace the original letters with such a subset in $O(n \log|\Sigma|)$ time (by
creating a balanced tree for the letters appearing in the input string). Note that the same
applies to previous algorithms: when an LZ77 representation is created using a suffix tree,
the linear-time construction for it assumes that the alphabet consists of integers and that
these can be sorted in linear time [2]. Although Sakamoto's method was designed to work
with constant-size $\Sigma$, it can be easily extended to the case when $\Sigma$ can be identified with a
sequence of $O(N^c)$ numbers, retaining the linear running time.

2. The algorithm 

The input sequence to be represented by a context-free grammar is $T \in \Sigma^*$ and N denotes
its initial length. The algorithm introduces new symbols to the instance, which are the
nonterminals of the constructed grammar. However, these are later treated exactly as the
original letters, so we insist on calling them letters as well and use the common set $\Sigma$ for both
letters and nonterminals. We assume that T is represented as a doubly-linked list, so that
removal and replacement of its elements can be performed in constant time (assuming that
we have a link to such an appearance).

The smallest grammar generating T is denoted by G and its size, measured as the length of
the productions, is g. The crucial part of the analysis is the modification of G according to the
compression performed on T. The terms nonterminal, rule, etc. always refer to the optimal
grammar G (or its transformed version). Still, we need to estimate the number of productions
in the constructed grammar; to avoid confusion, we speak of the cost of representing a letter
a, i.e. the number of rules in the constructed grammar needed for the new 'letter' a in terms
of letters already present in the instance. Note that when a replaces a block of letters, the
cost might be larger than constant.

Algorithm 1 TtoG: outline

 1: while $|T| > 1$ do
 2:     $L \leftarrow$ list of letters in T
 3:     for each $a \in L$ do                                        ▷ Blocks compression
 4:         compress maximal blocks of a                             ▷ $O(|T|)$
 5:     $P \leftarrow$ list of pairs
 6:     find partition of $\Sigma$ into $\Sigma_\ell$ and $\Sigma_r$   ▷ Try to maximize the occurrences from $\Sigma_\ell\Sigma_r$ in T
 7:                                                                  ▷ $O(|T|)$, see Lemma 3
 8:     for $ab \in P \cap \Sigma_\ell\Sigma_r$ do                   ▷ These pairs do not overlap
 9:         compress pair ab                                         ▷ Pair compression
10: return the constructed grammar



Lemma 1. Without loss of generality, at the beginning of a phase, the letters used in T form
an interval of numbers.

Proof. Observe that we assumed that the input alphabet consists of letters that can be
identified with a subset of $\{1, \ldots, N^c\}$, see the discussion in the introduction. Treating them
as vectors of length c over $\{0, \ldots, N - 1\}$ we can sort them using RadixSort in $O(cN)$ time,
i.e. linear time. Then we can re-number those letters to $1, 2, \ldots, n$ for some $n \le N$.

Suppose that at the beginning of the phase the letters formed an interval $[m \,..\, m + k]$. Each
new letter, introduced in place of a compressed pair or block, is assigned a consecutive value,
and so after the phase the letters appearing in T are within an interval $[m \,..\, m + k']$ for
some $k' > k$. It is now left to re-number the letters from $[m \,..\, m + k']$, so that only those appearing
in T have valid numbers: we go through T and for each letter a with number in $[m \,..\, m + k']$
we increase count[a] by 1. Then we go through count and assign consecutive numbers,
starting from $m + k' + 1$, to letters with non-zero count. Lastly, we replace the values of those
letters in T by the new values. □
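
The re-numbering step of the proof can be sketched as a single counting pass. The function name `renumber` and the representation of T as a Python list of integers are assumptions of this illustration; `lo` and `hi` play the roles of $m$ and $m + k'$.

```python
def renumber(text, lo, hi):
    """Renumber the letters of text, all assumed to lie in [lo, hi], to an interval
    of consecutive numbers starting at hi + 1, using one counting pass."""
    count = [0] * (hi - lo + 1)
    for a in text:
        count[a - lo] += 1
    new_name, nxt = [None] * (hi - lo + 1), hi + 1
    for offset, c in enumerate(count):
        if c > 0:                      # only letters that appear get a new number
            new_name[offset] = nxt
            nxt += 1
    return [new_name[a - lo] for a in text], (hi + 1, nxt - 1)
```

The running time of this sketch is proportional to $|T|$ plus the width of the interval.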

2.1. Blocks compression. The blocks compression is very simple to implement: we read
T and for each block of a's of length greater than 1 we create a record $(a, \ell, p)$, where $\ell$ is the length
of the block and p is the pointer to the first letter in this block. We then sort these records
lexicographically using RadixSort (ignoring the last component). There are only $O(|T|)$
records and we assume that $\Sigma$ can be identified with an interval, see Lemma 1, so this is all
done in $O(|T|)$ time. Now, for a fixed letter a, the consecutive tuples with the first coordinate a
correspond to all blocks of a, ordered by size. It is easy to replace them in $O(|T|)$ time
with new letters.

Note that the grammar still needs to represent the $a_\ell$ replacing $a^\ell$: suppose we are to
replace the blocks $a^{\ell_1}, a^{\ell_2}, \ldots, a^{\ell_k}$, where $1 < \ell_1 < \ell_2 < \cdots < \ell_k$. (For simplicity of
presentation, let $\ell_0 = 0$.) Such a list can be easily obtained from the sorted list of appearances
of blocks. Then we first represent each block of length $\ell_{i+1} - \ell_i$ using the binary expansion:
for each $2^j$, where $1 \le j \le \log(\ell_k - \ell_{k-1})$, let $a_{2^j} \to a_{2^{j-1}} a_{2^{j-1}}$ and $a_1 \to a$. Then each $a_{\ell_i - \ell_{i-1}}$
can be represented as a concatenation of some of $a_1, a_2, \ldots, a_{2^j}$, based on the binary notation
of $\ell_i - \ell_{i-1}$. Next, $a_{\ell_{i+1}}$ is represented as $a_{\ell_{i+1}} \to a_{\ell_{i+1} - \ell_i} a_{\ell_i}$. The representation cost is
$O(k + \max_{i=1}^{k} \log(\ell_i - \ell_{i-1}))$.
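
The binary-expansion construction can be sketched as follows. The tuple names `("pow", j)`, `("diff", d)` and `("blk", l)` are hypothetical labels chosen to keep the three families of nonterminals apart, and the sketch generates powers of two up to the largest difference so that every difference can be expanded; both are assumptions of this illustration.

```python
def block_rules(a, lengths):
    """Rules for the letters replacing blocks a^l, for l in `lengths` (sorted, all > 1).

    ("pow", j) derives a^(2^j), ("diff", d) derives a^d, ("blk", l) derives a^l.
    """
    rules = {("pow", 0): [a]}
    max_diff = max(l - p for p, l in zip([0] + lengths, lengths))
    for j in range(1, max_diff.bit_length()):
        rules[("pow", j)] = [("pow", j - 1), ("pow", j - 1)]
    prev = 0
    for l in lengths:
        d = l - prev
        # binary expansion of d: one power-of-two nonterminal per set bit
        rules[("diff", d)] = [("pow", j) for j in range(d.bit_length()) if d >> j & 1]
        rules[("blk", l)] = [("diff", d)] + ([("blk", prev)] if prev else [])
        prev = l
    return rules

def expand(sym, rules):
    """Fully expand a symbol into a list of letters."""
    if sym not in rules:
        return [sym]
    return [x for part in rules[sym] for x in expand(part, rules)]

rules = block_rules("a", [2, 3, 7])
```

For the lengths 2, 3, 7 the differences are 2, 1, 4, so the sketch creates three power rules, three difference rules and three block rules.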

Since no two maximal blocks of the same letter can be next to each other, after the block 
compression there are no blocks of length greater than 1 in T. 

Lemma 2. In line 6 there are no two consecutive letters aa in T.

Proof. Suppose for the sake of contradiction that there are two such letters. There are two
cases:

a was present in T in line 2: But then a was listed in L in line 2 and aa was replaced
by another letter in line 4.

a was introduced in line 4: Then both letters a replaced maximal blocks $b^\ell$, thus aa
replaced $b^{2\ell}$, and so neither of those two blocks was maximal. □

2.2. Pair compression. The pair compression is performed similarly to the block
compression. However, since the pairs can overlap, compressing all pairs at the same time is
not possible. Still, we can find a subset of non-overlapping pairs in T such that a constant
fraction of T is covered by appearances of these pairs. This subset is defined by a partition
of $\Sigma$ into $\Sigma_\ell$ and $\Sigma_r$ and choosing the pairs with the first letter in $\Sigma_\ell$ and the second in $\Sigma_r$.

Lemma 3. For T, in $O(|T|)$ time we can find in line 6 a partition of $\Sigma$ into $\Sigma_\ell$, $\Sigma_r$ such
that the number of appearances of pairs $ab \in \Sigma_\ell\Sigma_r$ in T is at least $(|T| - 1)/4$ (or 1, if this is less
than 1).

In the same running time we can provide, for each $ab \in P \cap \Sigma_\ell\Sigma_r$, a list of pointers to
appearances of ab in T.

Proof. For a choice of $\Sigma_\ell\Sigma_r$ we say that appearances of $ab \in P \cap \Sigma_\ell\Sigma_r$ are covered by $\Sigma_\ell\Sigma_r$.

If $|T| < 5$ then we can easily choose a partition such that at least one pair is covered. In
the following we consider only longer T.

The existence of a partition covering at least one fourth of the appearances can be shown
by a simple probabilistic argument: divide $\Sigma$ into $\Sigma_\ell$ and $\Sigma_r$ randomly, where each letter
goes to each of the parts with probability 1/2. Consider two consecutive letters ab in T; note
that they are different by Lemma 2. Then $a \in \Sigma_\ell$ and $b \in \Sigma_r$ with probability 1/4. There
are $|T| - 1$ such pairs in T, so the expected number of pairs in T from $\Sigma_\ell\Sigma_r$ is $(|T| - 1)/4$.
Observe that if we were to count the number of pairs that are covered either by $\Sigma_\ell\Sigma_r$ or by
$\Sigma_r\Sigma_\ell$, then the expected number of pairs covered by $\Sigma_\ell\Sigma_r \cup \Sigma_r\Sigma_\ell$ is $(|T| - 1)/2$.

The deterministic construction of such a partition follows by a simple derandomisation,
using an expected value approach. It is easier to first find a partition such that at least
$(|T| - 1)/2$ pairs' appearances in T are covered by $\Sigma_\ell\Sigma_r \cup \Sigma_r\Sigma_\ell$; we then choose $\Sigma_\ell\Sigma_r$ or
$\Sigma_r\Sigma_\ell$, depending on which of them covers more appearances.

Suppose that we have already assigned some letters to $\Sigma_\ell$ and $\Sigma_r$ and we are to decide
where the next letter a is assigned. If it is assigned to $\Sigma_\ell$, then all appearances of pairs
from $a\Sigma_\ell \cup \Sigma_\ell a$ are not going to be covered, while appearances of pairs from $a\Sigma_r \cup \Sigma_r a$ are;
a similar observation holds for a being assigned to $\Sigma_r$. The algorithm makes a greedy choice,
maximising the number of covered pairs in each step. As there are only two options, the
choice brings in at least half of the appearances considered. Lastly, as each appearance of a pair
bc from T is considered exactly once (i.e. when the second of b, c is considered in the main
loop), this procedure guarantees that at least half of the appearances of pairs in T are covered.

In order to make the selection effective, the algorithm GreedyPairs keeps up-to-date
counters $count_\ell[a]$ and $count_r[a]$, denoting, respectively, the number of appearances of pairs
from $a\Sigma_\ell \cup \Sigma_\ell a$ and $a\Sigma_r \cup \Sigma_r a$ in T. Those counters are updated as soon as a letter is
assigned to $\Sigma_\ell$ or $\Sigma_r$.



Algorithm 2 GreedyPairs

 1: $L \leftarrow$ set of letters used in P
 2: $\Sigma_\ell \leftarrow \Sigma_r \leftarrow \emptyset$                      ▷ Organised as a bit vector
 3: for $a \in L$ do
 4:     $count_\ell[a] \leftarrow count_r[a] \leftarrow 0$                      ▷ Initialisation
 5: for $a \in L$ do
 6:     if $count_r[a] \ge count_\ell[a]$ then                                  ▷ Choose the one that guarantees larger cover
 7:         $choice \leftarrow \ell$
 8:     else
 9:         $choice \leftarrow r$
10:     $\Sigma_{choice} \leftarrow \Sigma_{choice} \cup \{a\}$
11:     for each ab or ba appearance in T do
12:         $count_{choice}[b] \leftarrow count_{choice}[b] + 1$
13: if # appearances of pairs from $\Sigma_r\Sigma_\ell$ in T > # appearances of pairs from $\Sigma_\ell\Sigma_r$ in T then
14:     switch $\Sigma_\ell$ and $\Sigma_r$
15: return ($\Sigma_\ell$, $\Sigma_r$)



By the argument given above, when $\Sigma$ is partitioned into $\Sigma_\ell$ and $\Sigma_r$, at least half of the
appearances of pairs from T are covered by $\Sigma_\ell\Sigma_r \cup \Sigma_r\Sigma_\ell$. Then one of the choices $\Sigma_\ell\Sigma_r$ or
$\Sigma_r\Sigma_\ell$ covers at least one fourth of the appearances.
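
The greedy choice can be sketched compactly. This sketch uses hash-based counters rather than the sorted left and right lists, so it illustrates the partition rule and not the stated linear-time implementation; the function name `greedy_partition` and the labels `"l"` and `"r"` are assumptions of the example.

```python
from collections import Counter, defaultdict

def greedy_partition(text):
    """Greedy derandomised partition; assumes no two equal adjacent letters in text."""
    weight = Counter(zip(text, text[1:]))        # number of appearances of each pair ab
    neighbours = defaultdict(Counter)            # neighbours[a][b] = #ab + #ba
    for (a, b), w in weight.items():
        neighbours[a][b] += w
        neighbours[b][a] += w
    side = {}                                    # letter -> "l" or "r"
    cnt = {"l": Counter(), "r": Counter()}       # cnt[s][a]: pairs joining a with side s
    for a in neighbours:
        choice = "l" if cnt["r"][a] >= cnt["l"][a] else "r"
        side[a] = choice
        for b, w in neighbours[a].items():
            cnt[choice][b] += w

    def covered(first, second):
        return sum(w for (a, b), w in weight.items()
                   if side[a] == first and side[b] == second)

    if covered("r", "l") > covered("l", "r"):    # keep the better of the two orientations
        side = {a: ("r" if s == "l" else "l") for a, s in side.items()}
    left = {a for a, s in side.items() if s == "l"}
    return left, set(side) - left

left, right = greedy_partition(list("abracadabra"))
```

On any input without equal adjacent letters, the pairs with first letter in `left` and second in `right` should cover at least $(|T| - 1)/4$ of the positions, as in Lemma 3.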

It is left to give an efficient variant of GreedyPairs; the non-obvious operations are the
choice of the actual partition in line 13 and the updating of $count_\ell[b]$ or $count_r[b]$ in line 12.
All other operations clearly take at most $O(|T|)$ time. The former is simple: since $\Sigma_\ell$ and
$\Sigma_r$ are organised as bit vectors, we can read T and calculate the number of pairs from $\Sigma_\ell\Sigma_r$ and
those from $\Sigma_r\Sigma_\ell$.

To implement the count, for each letter a in T we have a right list $right(a) = \{b \mid ab$ appears in $T\}$,
represented as a list. Furthermore, the element b on the right list stores a list of all appearances
of the pair ab in T. There is a similar left list $left(a) = \{b \mid ba$ appears in $T\}$. We comment
later on how to create the left lists and right lists.

Given right and left, performing the update in line 12 is easy (suppose that we are to update
$count_\ell$): we go through $right(a)$ ($left(a)$) and increase $count_\ell[b]$ for each appearance of ab (ba,
respectively). Note that in this way each of the lists $right(a)$ ($left(a)$) is read $O(1)$ times
during GreedyPairs, and so this time can be charged to their creation.

It remains to show how to initially create $right(a)$ ($left(a)$ is created similarly). We read
T; when reading a pair ab we create a record $(a, b, p)$, where p is a pointer to this appearance.
We then sort these records lexicographically using RadixSort. There are only $O(|T|)$ records
and we assume that $\Sigma$ can be identified with an interval, see Lemma 1, so this is all done in
$O(|T|)$ time. Now, for a fixed letter a, the consecutive tuples with the first coordinate a can be
turned into $right(a)$: for $b \in right(a)$ we want to store a list I of pointers to appearances of
ab, and on a sorted list of tuples the $\{(a, b, p)\}_{p \in I}$ are consecutive elements.

Lastly, in order to get for each $ab \in P \cap \Sigma_\ell\Sigma_r$ the list of pointers to appearances of ab
in T, it is enough to read right and filter the pairs such that $a \in \Sigma_\ell$ and $b \in \Sigma_r$; the filtering
can be done in $O(1)$ time per pair, as $\Sigma_\ell$ and $\Sigma_r$ are represented as bit vectors. The needed time is $O(|T|)$.

The total running time is also $O(|T|)$, as each subprocedure takes constant time per pair
processed or $O(|T|)$ in total. □

When for each pair $ab \in \Sigma_\ell\Sigma_r$ the list of its appearances in T is provided, the replacement
of pairs is done by going through the list and replacing each of the pairs, which is done in
linear time. Note that as $\Sigma_\ell$, $\Sigma_r$ are disjoint, the considered pairs cannot overlap.

2.3. Size and running time. It remains to estimate the total running time, summed over
all phases. Clearly each subprocedure in a phase has a running time $O(|T|)$, so it is enough to
show that $|T|$ is reduced by a constant factor per phase.

Lemma 4. In each phase $|T|$ is reduced by a constant factor.

Proof. Let $m = |T|$ at the beginning of the phase. Let $m' \le m$ be the length of T after the
compression of blocks. By Lemma 3, at least $(m' - 1)/4$ pairs are compressed during the
pair compression, hence after this phase $|T'| \le m' - (m' - 1)/4 \le \frac{3}{4}m + \frac{1}{4}$. □

Theorem 1. TtoG runs in linear time.

Proof. Each phase clearly takes $O(|T|)$ time and by Lemma 4 the length $|T|$ drops by a constant
factor in each phase. As the initial length of T is N, the total running time is $O(N)$. □
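
The geometric decrease can be made explicit. In the following sketch $t_i$ denotes $|T|$ at the beginning of phase $i$ (so $t_0 = N$) and $p$ denotes the number of phases; both symbols are introduced here only for the calculation, and $N \ge 2$ is assumed.

```latex
\begin{align*}
t_{i+1} \le \tfrac{3}{4}\, t_i + \tfrac{1}{4}
  &\;\Longrightarrow\; t_{i+1} - 1 \le \tfrac{3}{4}\,(t_i - 1)
  \;\Longrightarrow\; t_i \le 1 + \left(\tfrac{3}{4}\right)^{i} (N - 1), \\
\sum_{i=0}^{p-1} t_i
  &\le \sum_{i=0}^{p-1} \Bigl( 1 + \left(\tfrac{3}{4}\right)^{i} (N - 1) \Bigr)
  \le p + 4\,(N - 1).
\end{align*}
```

A phase is executed only while $t_i \ge 2$, which forces $\left(\tfrac{3}{4}\right)^{i}(N - 1) \ge 1$, so $p \le 1 + \log_{4/3}(N - 1)$ and the sum is $O(N)$.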

3. Size of the grammar: SLPs and recompression 

To bound the cost of representing the letters introduced during the construction of the
grammar, we start with the smallest grammar G generating (the input) T and then modify it so
that it generates T at the beginning of each phase. Then the cost of representing the
letters introduced is paid by various credits assigned to G. Note that this is entirely a mental
experiment for the purpose of the analysis, as G is not stored or even known to the algorithm.

We assume that the grammar G is a Straight Line Programme; however, we relax the notion a
bit: its nonterminals are numbered $X_1, \ldots, X_m$ and each rule is of the form $X_i \to w$
or $X_i \to u X_j v$ or $X_i \to u X_j v X_k w$, where $u, v, w \in \Sigma^*$ and $j, k < i$. Note that every CFG
generating a unique string can be transformed into such a form, with the size increased only
by a constant factor. We call the letters (strings) appearing in the productions the explicit
letters (strings, respectively). The unique string derived by $X_i$ is denoted by $\mathrm{val}(X_i)$; the
grammar G shall satisfy the condition $\mathrm{val}(X_m) = T$. We do not assume that $\mathrm{val}(X_i) \ne \epsilon$;
however, if $\mathrm{val}(X_i) = \epsilon$ then $X_i$ is not used in the productions of G.

With each explicit letter we associate a unit credit and pay most of the cost of representing 
the letters introduced during TtoG with these credits. The total cost is the sum of credit 
that was issued during the modifications of G plus some value which we estimate separately 
later on. 

Recall that whenever we say nonterminal, rule, production etc., we mean one of G. 

3.1. Pair compression. A pair of letters ab has a crossing appearance in a nonterminal
$X_i$ (with a rule $X_i \to \alpha_i$) if ab is in $\mathrm{val}(X_i)$ but this appearance does not come from an
explicit appearance of ab in $\alpha_i$ nor is it generated by any of the nonterminals in $\alpha_i$. A pair is
non-crossing if it has no crossing appearance. Unless explicitly written, we use this notion
only for pairs of different letters.

By $PC_{ab \to c}(w)$ we denote the text obtained from w by replacing each ab by a letter c
(we assume that $a \ne b$). We say that a procedure properly implements the pair compression
of ab to c if $\mathrm{val}(X'_m) = PC_{ab \to c}(\mathrm{val}(X_m))$ and the resulting grammar is of the desired form; in
particular, the credits are assigned to each explicit letter. When a pair ab is noncrossing,
implementing the pair compression is easy, as it is enough to replace each explicit ab with c.



Algorithm 3 PairCompNCr(ab, c): compressing a non-crossing pair ab.
1: replace each explicit ab in G by c

In order to distinguish between the nonterminals, grammar, etc. before and after the
application of compression of ab (or, in general, any procedure) we use 'primed' letters, i.e.
$X'_i$, G', T' for the nonterminals, grammar and text after this compression and 'unprimed',
i.e. $X_i$, G, T for the ones before.

Lemma 5. If ab is a noncrossing pair, then PairCompNCr(ab, c) properly implements the
compression of ab. The credit and cost of representing the new letter c is paid by the released
credit. If a pair de appearing in T was noncrossing in G, it is noncrossing in G'.

Proof. By induction on i we show that $\mathrm{val}(X'_i) = PC_{ab \to c}(\mathrm{val}(X_i))$. Consider any
appearance of ab in the string generated by $X_i$. If it is an explicit string then it is replaced
by PairCompNCr(ab, c). If it is contained within a substring generated by some $X_j$, this
appearance was compressed by the inductive assumption. The remaining case is the crossing
appearance of ab, but this is eliminated by the assumption that ab is non-crossing.

Each appearance of ab had two units of credit while c needs only one, so the replacement
released two units of credit, one of which is used to pay for the credit of c and the other
to pay the cost of representation of c (if we replace more than one pair ab, some credit is
wasted).

Lastly, replacing ab in G by a new letter c cannot make de a crossing pair. □

It is left to ensure that the pairs from $\Sigma_\ell\Sigma_r$ are all noncrossing. Intuitively, there are three
types of 'bad' situations:

• there is a nonterminal $X_i$ such that $\mathrm{val}(X_i)$ begins with b and $aX_i$ appears in one of
the rules;

• there is a nonterminal $X_i$ such that $\mathrm{val}(X_i)$ ends with a and $X_i b$ appears in one of
the rules;

• there are nonterminals $X_i$, $X_j$ such that $\mathrm{val}(X_i)$ ends with a and $\mathrm{val}(X_j)$ begins with
b and $X_i X_j$ appears in one of the rules.

Note that in each of these cases $i < m$, and $j < m$ in the last case as well. Consider the first
case and let $bw = \mathrm{val}(X_i)$. Then it is enough to modify the rule for $X_i$ so that $\mathrm{val}(X_i) = w$ and
to replace each $X_i$ in the rules by $bX_i$; we call this action left-popping b from $X_i$. Similar
operations can be done in the other cases. Such operations can be performed for many letters in
parallel: the procedure Pop($\Sigma_\ell$, $\Sigma_r$) below 'uncrosses' all pairs from the set $\Sigma_\ell\Sigma_r$, assuming
that $\Sigma_\ell$ and $\Sigma_r$ are disjoint subsets of $\Sigma$.



Algorithm 4 Pop($\Sigma_\ell$, $\Sigma_r$): popping letters from $\Sigma_\ell$ and $\Sigma_r$

 1: for $i \leftarrow 1 \,..\, m - 1$ do
 2:     let the production for $X_i$ be $X_i \to \alpha_i$
 3:     if the first symbol of $\alpha_i$ is $b \in \Sigma_r$ then          ▷ Left-popping b
 4:         remove this b from $\alpha_i$
 5:         replace $X_i$ in G's productions by $bX_i$
 6:         if $\mathrm{val}(X_i) = \epsilon$ then
 7:             remove $X_i$ from G's productions
 8: for $i \leftarrow 1 \,..\, m - 1$ do
 9:     let the production of $X_i$ be $X_i \to \alpha_i$
10:     if the last symbol of $\alpha_i$ is $a \in \Sigma_\ell$ then        ▷ Right-popping a
11:         remove this a from $\alpha_i$
12:         replace $X_i$ in G's productions by $X_i a$
13:         if $\mathrm{val}(X_i) = \epsilon$ then
14:             remove $X_i$ from G's productions

Lemma 6. After application of Pop($\Sigma_\ell$, $\Sigma_r$), where $\Sigma_\ell \cap \Sigma_r = \emptyset$, none of the pairs $ab \in \Sigma_\ell\Sigma_r$
is crossing. Furthermore, $\mathrm{val}(X'_m) = \mathrm{val}(X_m)$. The credit of G increases by at most $O(m)$.

Proof. Consider only the first half of Pop (left-popping), which deals with the leading letters.
We show by induction on i, the number of the processed nonterminal, that $\mathrm{val}(X'_j) = \mathrm{val}(X_j)$
if $j > i$ or the first letter of $\mathrm{val}(X_j)$ is not in $\Sigma_r$; otherwise $\mathrm{val}(X_j) = b\,\mathrm{val}(X'_j)$ for some
letter $b \in \Sigma_r$.

For the induction basis consider $i = 1$ and let the rule for $X_1$ be $X_1 \to bu$, where $b \in \Sigma$ and
$u \in \Sigma^*$.

$b \in \Sigma_r$: then this b is removed from the rule and each appearance of $X_1$ in the rules'
bodies is replaced with $bX'_1$. Hence, $\mathrm{val}(X_1) = b\,\mathrm{val}(X'_1)$ and for each other
nonterminal $\mathrm{val}(X'_j) = \mathrm{val}(X_j)$.

$b \notin \Sigma_r$: then nothing is changed and so $\mathrm{val}(X'_j) = \mathrm{val}(X_j)$ for all j.

So consider an inductive step and let Pop consider the nonterminal $X_{i+1}$. We distinguish
three copies of nonterminals now: the original ones (i.e. $X_{i+1}$), the ones obtained after the
processing of $X_i$ but before $X_{i+1}$ (those are denoted with primes, i.e. $X'_{i+1}$) and the ones
that are obtained after considering $X_{i+1}$ (which are denoted with double primes, i.e. $X''_{i+1}$).
By the inductive assumption $\mathrm{val}(X_{i+1}) = \mathrm{val}(X'_{i+1}) = bu$ for some $b \in \Sigma$ and $u \in \Sigma^*$. If
$b \notin \Sigma_r$ then Pop performs no action and we are done. If $b \in \Sigma_r$ then the first symbol in
the rule for $X'_{i+1}$ is b: the only other option is that it begins with some nonterminal $X'_k$ for
$k \le i$. Then $X_k$ did not left-pop a letter and so $\mathrm{val}(X'_k) = \mathrm{val}(X_k)$ and it begins with $b \in \Sigma_r$.
Contradiction, as $X_k$ should have left-popped the letter b.

So the first letter in the rule for $X'_{i+1}$ is b, and Pop removes this b from the rule and
replaces each $X'_{i+1}$ in the rules by $bX''_{i+1}$. Thus $\mathrm{val}(X_{i+1}) = \mathrm{val}(X'_{i+1}) = b\,\mathrm{val}(X''_{i+1})$. And in
the rules, each $X'_{i+1}$ was replaced with $bX''_{i+1}$, which evaluates to the same string, so the values
of the other nonterminals have not changed.

Suppose that after Pop there is a crossing pair $ab \in \Sigma_\ell\Sigma_r$. There are three already
mentioned cases; consider only one of them, that $aX_i$ appears in the rule and $\mathrm{val}(X_i)$ begins
with b. Note that as $a \notin \Sigma_r$ is the letter to the left of $X_i$, $X_i$ did not left-pop a letter. But
its value begins with $b \in \Sigma_r$, so it should have. Contradiction. The other cases are dealt with in a
similar manner.

Note that at most 4 new letters are introduced to each rule, thus the credit increases by
at most 4m. □

In order to compress pairs from $\Sigma_\ell\Sigma_r$ it is enough to first uncross them all and then compress them.



Algorithm 5 PairComp($\Sigma_\ell$, $\Sigma_r$): compresses pairs from $\Sigma_\ell\Sigma_r$
1: run Pop($\Sigma_\ell$, $\Sigma_r$)
2: for $ab \in \Sigma_\ell\Sigma_r$ do
3:     run PairCompNCr($ab$, $c$)        ▷ $c$ is a fresh letter



Lemma 7. PairComp implements pair compression for each $ab \in \Sigma_\ell\Sigma_r$. It introduces $O(m)$ new credit to $G$.

Corollary 1. The compression of pairs introduces in total $O(m \log N)$ credit.
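
As a rough illustration of what pair compression achieves on the text itself, the following Python sketch replaces every occurrence of a pair from $\Sigma_\ell\Sigma_r$ in a plain list of letters. It ignores the grammar, Pop and the credit entirely; the function name `pair_comp` and the naming scheme for fresh letters are assumptions made only for this example. Since $\Sigma_\ell$ and $\Sigma_r$ are disjoint, two occurrences of such pairs cannot overlap, so one left-to-right scan is enough.

```python
from typing import Dict, List, Set, Tuple


def pair_comp(text: List[str], left: Set[str], right: Set[str]
              ) -> Tuple[List[str], Dict[Tuple[str, str], str]]:
    """Replace each pair ab with a in `left`, b in `right` by a fresh letter."""
    assert not (left & right), "the two parts of the partition must be disjoint"
    fresh: Dict[Tuple[str, str], str] = {}   # pair -> hypothetical fresh letter
    out: List[str] = []
    i = 0
    while i < len(text):
        if i + 1 < len(text) and text[i] in left and text[i + 1] in right:
            pair = (text[i], text[i + 1])
            # One fresh letter per distinct pair (naming is illustrative only).
            fresh.setdefault(pair, f"<{pair[0]}{pair[1]}>")
            out.append(fresh[pair])
            i += 2
        else:
            out.append(text[i])
            i += 1
    return out, fresh


if __name__ == "__main__":
    print(pair_comp(list("abcab"), {"a", "c"}, {"b"}))
```

On the grammar, the same replacement is only safe after Pop has removed the crossing pairs, which is exactly the order used by PairComp.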

3.2. Blocks compression. Similar notions and analysis are applied for blocks. Consider appearances of maximal $a$-blocks in $T$ and their derivation by $G$. A block $a^\ell$ has a crossing appearance in $X_i$ with a rule $X_i \to \alpha_i$ if it is contained in $\mathrm{val}(X_i)$ but this appearance is generated neither by the explicit letters $a$ in the rule nor inside the substrings generated by the nonterminals in $\alpha_i$. If the blocks of $a$ have no crossing appearances, then $a$ has no crossing blocks. As for noncrossing pairs, the compression of $a$ blocks, when $a$ has no crossing blocks, is easy.



Algorithm 6 BlockCompNCr($a$), which compresses $a$ blocks when $a$ has no crossing blocks
1: for each $a^{\ell}$ do
2:     replace every explicit maximal block $a^{\ell}$ in $G$ by $a_{\ell}$

Lemma 8. Suppose that $a$ has no crossing blocks. Then BlockCompNCr($a$) properly compresses $a$'s blocks. Furthermore, if a letter $b$ from $T$ had no crossing blocks in $G$, it does not have them in the resulting grammar $G'$.

The proof is similar to the proof of the corresponding lemma for noncrossing pairs and so it is omitted.
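
The effect of block compression on the text alone can be sketched as follows. This is a simplified illustration on a plain list of letters, not the grammar-level procedure; the name `block_comp`, the fresh-letter naming `a_3`, and the choice to leave blocks of length 1 untouched are assumptions of this example.

```python
from itertools import groupby
from typing import Dict, List, Tuple


def block_comp(text: List[str]) -> Tuple[List[str], Dict[Tuple[str, int], str]]:
    """Replace each maximal block a^l with l > 1 by a single fresh letter a_l."""
    fresh: Dict[Tuple[str, int], str] = {}   # (letter, length) -> fresh letter
    out: List[str] = []
    for letter, run in groupby(text):        # groupby yields maximal runs
        length = sum(1 for _ in run)
        if length == 1:
            out.append(letter)               # single letters are kept as they are
        else:
            fresh.setdefault((letter, length), f"{letter}_{length}")
            out.append(fresh[(letter, length)])
    return out, fresh


if __name__ == "__main__":
    print(block_comp(list("aaabaab")))
```

Equal blocks receive the same fresh letter, which mirrors the requirement that every explicit maximal block $a^{\ell}$ is replaced by the same $a_{\ell}$.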

It is left to ensure that no letter has a crossing block. The solution is similar to Pop; this time, though, we need to remove the whole prefix and suffix from $\mathrm{val}(X_i)$ instead of a single letter. The idea is as follows: suppose that $a$ has a crossing block because $aX_i$ appears in a rule and $\mathrm{val}(X_i)$ begins with $a$. Left-popping $a$ does not solve the problem, as it might be that $\mathrm{val}(X_i)$ still begins with $a$. Thus, we keep on left-popping until the first letter of $\mathrm{val}(X_i)$ is not $a$, i.e. we remove the $a$-prefix of $\mathrm{val}(X_i)$. The same works for suffixes.



Algorithm 7 RemCrBlocks: removing crossing blocks.
1: for $i \leftarrow 1 \ldots m - 1$ do
2:     let $a$, $b$ be the first and last letter of $\mathrm{val}(X_i)$
3:     let $\ell_i$, $r_i$ be the lengths of the $a$-prefix and the $b$-suffix of $\mathrm{val}(X_i)$
4:     if $\mathrm{val}(X_i) \in a^*$ then $r_i = 0$ and $\ell_i = |\mathrm{val}(X_i)|$
5:     remove $a^{\ell_i}$ from the beginning and $b^{r_i}$ from the end of the rule
6:     replace $X_i$ by $a^{\ell_i} X_i b^{r_i}$ in the rules
7:     if $\mathrm{val}(X_i) = \epsilon$ then
8:         remove $X_i$ from the rules
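
The procedure can be tried out on toy grammars with the Python sketch below. The representation (a list of rules, letters as strings, nonterminals as integer indices, every value non-empty) and the names `rem_cr_blocks` and `expand` are assumptions of this example; popped blocks are kept as explicit lists of letters, so unlike the actual algorithm the sketch does not cope with exponentially long blocks.

```python
from typing import List, Union

Symbol = Union[str, int]      # str: a letter, int: index i of the nonterminal X_i
Grammar = List[List[Symbol]]  # rule i may only mention nonterminals with index < i


def expand(g: Grammar, i: int) -> str:
    """Naively compute val(X_i); only sensible for toy grammars."""
    return "".join(s if isinstance(s, str) else expand(g, s) for s in g[i])


def rem_cr_blocks(grammar: Grammar) -> Grammar:
    """Pop the maximal letter prefix and suffix of every nonterminal but the last."""
    g = [list(rule) for rule in grammar]
    for i in range(len(g) - 1):
        rule = g[i]
        a, b = rule[0], rule[-1]
        # Earlier iterations already pushed popped letters into this rule,
        # so its first and last symbols are explicit letters.
        assert isinstance(a, str) and isinstance(b, str)
        left = 0
        while left < len(rule) and rule[left] == a:
            left += 1
        right = 0
        if left < len(rule):              # otherwise val(X_i) is in a*, so r_i = 0
            while rule[-1 - right] == b:
                right += 1
        g[i] = rule[left:len(rule) - right]
        # X_i becomes a^l X_i b^r, or just a^l b^r when nothing is left of X_i.
        popped = [a] * left + ([i] if g[i] else []) + [b] * right
        for k in range(i + 1, len(g)):
            g[k] = [t for s in g[k] for t in (popped if s == i else [s])]
    return g


if __name__ == "__main__":
    old = [["a", "a", "b"], [0, "b", 0], [1, "a", 1, 0]]
    new = rem_cr_blocks(old)
    print(new, expand(new, 2) == expand(old, 2))
```

In the example, $X_1$ (index 0) is popped completely and disappears from the rules, while the value of the last nonterminal is unchanged.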



Lemma 9. After RemCrBlocks no letter has a crossing block and $\mathrm{val}(X_m') = \mathrm{val}(X_m)$.

Proof. Observe that if $aX_i$ appears in any rule, then $a$ is not the first letter of $\mathrm{val}(X_i)$, as the prefix of letters $a$ was removed from $X_i$. Other cases are handled similarly. So there are no crossing blocks after RemCrBlocks. Clearly, $\mathrm{val}(X_m') = \mathrm{val}(X_m)$. □

So the compression of all blocks of letters is done by first running RemCrBlocks and then compressing the blocks of each letter. Note that we do not compress blocks of letters that are introduced in this way. Concerning the amount of credit, the arbitrarily long blocks popped by RemCrBlocks are compressed (each into a single letter) and so only 4 credit per rule is introduced.



Algorithm 8 BlockComp: compresses blocks of letters
1: run RemCrBlocks
2: $L \leftarrow$ list of letters in $T$
3: for each $a \in L$ do
4:     run BlockCompNCr($a$)



Lemma 10. BlockComp properly compresses blocks of each letter $a$ appearing in $T$ before its application and introduces at most $O(m)$ credit.

3.3. Calculating the cost of representing letters in block compression. While the credit was enough to pay the cost of representing letters introduced during the pair compression, this is not the case for block compression. The appropriate analysis is presented in this section. The overall plan is as follows: firstly, we define a scheme of representing the letters based on the grammar $G$ and the way $G$ is changed by BlockComp. For such a representation scheme, we show that the cost of representation is $O(g \log N)$. Lastly, it is proved that the actual cost of representing the letters by TtoG is not larger than the one defined based on $G$, hence it is also $O(g \log N)$.



3.3.1. Representing blocks of letters using $G$. The intuition is as follows: while the $a$ blocks can have exponential length, most of them do not differ much, as in most cases the new blocks are obtained by concatenating letters $a$ that appear explicitly in the grammar, and in such a case the credit can be used to pay for the creation of the new rule. This does not apply when the new block is obtained by concatenating two different blocks of $a$ (popped from nonterminals) inside a rule. However, this cannot happen too often: when blocks of length $p_1, p_2, \ldots, p_\ell$ are compressed (at the cost of $\sum_{i=1}^{\ell} \log p_i$), the length of the corresponding text in the input text is $\prod_{i=1}^{\ell} p_i$, which is at most $N$. Thus $\sum_{i=1}^{\ell} \log p_i = \log \prod_{i=1}^{\ell} p_i \le \log N$.

We create a new symbol for each $a$ block that is either popped from a nonterminal or is in a rule at the end of BlockComp. Such a block $a^\ell$ is a power if

• $X_i$ is removed from the rules (since $\mathrm{val}(X_i') = \epsilon$);

• it is obtained by concatenation of two different powers of $a$ inside a rule and perhaps some other explicit letters $a$.

The second case appears only when in the rule $X_i \to uX_jvX_kw$ the popped suffix of $X_j$ and the popped prefix of $X_k$ are blocks of the same letter, say $a$, and furthermore $v \in a^*$. Note that it might be that one (or both) of $X_j$ and $X_k$ were removed in the process and that this power was popped by RemCrBlocks. For each block $a^\ell$ that is not a power we may uniquely identify another block $a^k$ (which may be a power or $\epsilon$) such that $a^\ell$ was obtained by concatenating explicit letters to $a^k$.

We represent the blocks as follows:

• for a block $a^\ell$ that is a power we express $a_\ell$ using the binary expansion, which costs $O(1 + \log \ell)$;

• for a block $a^\ell$ that is obtained by concatenating $\ell - k$ explicit letters to $a^k$ we express $a_\ell$ as $a_k a \ldots a$; the cost $O(\ell - k)$ is covered by the credit released by the explicit letters.
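
One possible way to realise the binary expansion for a power is sketched below in Python. The concrete rule layout (doubling rules for $a^{2^j}$ plus one final rule listing the pieces selected by the binary digits of $\ell$) and all names are assumptions of this example; the text only states that the cost is $O(1 + \log \ell)$, which this layout meets.

```python
from typing import Dict, List


def power_rules(letter: str, ell: int) -> Dict[str, List[str]]:
    """Rules deriving letter^ell with total right-hand-side size O(1 + log ell)."""
    assert ell >= 1
    rules: Dict[str, List[str]] = {}
    prev = letter                 # symbol deriving letter^(2^j) for the current j
    pieces: List[str] = []
    n, j = ell, 0
    while n:
        if n & 1:                 # bit j of ell is set: letter^(2^j) is needed
            pieces.append(prev)
        n >>= 1
        if n:                     # a larger power is still needed: double once more
            name = f"{letter}^(2^{j + 1})"
            rules[name] = [prev, prev]
            prev = name
        j += 1
    rules[f"{letter}_{ell}"] = pieces[::-1]
    return rules


def expand(rules: Dict[str, List[str]], sym: str) -> str:
    """Expand a symbol to the string it derives (toy sizes only)."""
    if sym not in rules:
        return sym
    return "".join(expand(rules, s) for s in rules[sym])


if __name__ == "__main__":
    r = power_rules("a", 5)
    print(r, expand(r, "a_5"))
```

With this layout there is one doubling rule of size 2 per binary digit beyond the first and a final rule with at most one symbol per digit, so the total size is at most three times the number of binary digits of $\ell$.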

3.3.2. Cost of $G$-based representation. We now estimate the cost of representing the letters introduced during the block compression described in the previous section. The idea is that the powers charged to a single nonterminal $X_i$ can be represented at a cost of $O(\log N)$, and so it should be enough to spend that amount on the representation.

Lemma 11. The total cost of representing powers is $O(g \log N)$.

Proof. First notice that if a power is created because $\mathrm{val}(X_i) \in a^*$, then this happens at most twice for this $X_i$: when $X_i$ is removed, then before the popping $\mathrm{val}(X_i) = a^\ell b^r$ for some $r \ge 0$. Furthermore, $|\mathrm{val}(X_i)| \le N$ and so the cost of representing this power is at most $O(\log N)$. Summing over all nonterminals yields $O(m \log N)$.

For the powers that are obtained as a concatenation of two other powers, there are two cases. First, after the creation of the power in a rule $X_i \to uX_jvX_kw$ one of the nonterminals $X_j$, $X_k$ is removed. But this happens at most once for the rule and the cost $O(\log N)$ of representing the power can be charged to the rule. Summing over all rules yields $O(m \log N)$.

The last and crucial case is when after the creation of the power both nonterminals remain in the rule. Fix such a rule $X_i \to uX_jvX_kw$. Note that creation of the $a$ power here means that both the $a$-suffix of $\mathrm{val}(X_j)$ and the $a$-prefix of $\mathrm{val}(X_k)$ are represented using the powers of $a$; moreover, $v \in a^*$.

Fix this rule and consider all such creations of powers performed on this rule. Let the consecutive letters whose blocks are compressed be $a^{(1)}, a^{(2)}, \ldots, a^{(\ell)}$ and let their lengths be $p_1, p_2, \ldots, p_\ell$. Lastly, the $p_\ell$ repetitions of $a^{(\ell)}$ are replaced by $a^{(\ell+1)}$. Observe that $a^{(i+1)}$ does not need to be the letter that replaced the block of $a^{(i)}$, as there might have been some other compression performed on that letter. Then the cost of the representation is at most a constant times more than

\[
\sum_{i=1}^{\ell} (1 + \log p_i) \le 2 \sum_{i=1}^{\ell} \log p_i . \tag{1}
\]

Define the weight $w(\cdot)$ of a letter as the length of the substring of the original input string that it derives. Consider the weight of the string between $X_j$ and $X_k$. Clearly, after the $i$-th blocks compression it is exactly $p_i \cdot w(a^{(i)})$, as the block of $p_i$ letters $a^{(i)}$ was replaced by one letter. We claim that $w(a^{(i+1)}) \ge p_i\, w(a^{(i)})$: right after the $i$-th blocks compression the string between $X_j$ and $X_k$ is simply the letter $a^{(i)}_{p_i}$, which replaced the block of $p_i$ letters $a^{(i)}$. After some operations, this string consists of $p_{i+1}$ letters $a^{(i+1)}$. Since the operations in the whole TtoG do not remove letters from the string between $X_j$ and $X_k$ in a rule, it holds that

\[
w(a^{(i+1)}) \ge w(a^{(i)}_{p_i}) = p_i\, w(a^{(i)}) .
\]

Since $w(a^{(1)}) \ge 1$ it follows that $w(a^{(\ell+1)}) \ge \prod_{i=1}^{\ell} p_i$. As $w(a^{(\ell+1)}) \le N$ it can be concluded that

\[
\log N \ge \sum_{i=1}^{\ell} \log p_i .
\]

Therefore the whole cost $\sum_{i=1}^{\ell} \log p_i$, as estimated in (1), is $O(\log N)$. Summing over all nonterminals yields the bound $O(m \log N)$, as claimed. □
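
For a concrete feel of the last estimate, consider the following instance. The numbers are chosen here purely for illustration and do not come from any actual run: take $N = 1000$ and block lengths $(p_1, p_2, p_3) = (2, 5, 10)$, whose product $100$ does not exceed $N$. With logarithms to base 2,

\[
\sum_{i=1}^{3} (1 + \log p_i) = 3 + \log 100 \approx 9.64 \;\le\; 2 \sum_{i=1}^{3} \log p_i = 2 \log 100 \approx 13.29 \;\le\; 2 \log 1000 \approx 19.93 .
\]

So the three powers created in this rule together cost $O(\log N)$, even though each of them separately is only bounded by $O(\log N)$ as well.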

3.3.3. Comparing the cost of representation induced by $G$ and the one of TtoG. We now show that the cost of representing letters by the actual strategy used by TtoG is at most as large as the cost of the strategy defined for $G$ (note that the latter cost includes the credit released by explicit letters). Imagine all blocks represented by the grammar-based scheme as a directed weighted graph, in which the weights of edges correspond to the cost of representing a letter:

• when $a^\ell$ is a power, the node labelled with $a_\ell$ has an edge to $\epsilon$ with weight $1 + \log \ell$ (recall that this is the cost of representing this power);

• when $a_\ell$ is represented as a concatenation of $\ell - k$ letters to $a_k$, the node $a_\ell$ has an edge to $a_k$ of weight $\ell - k$ (this is the cost of representing this block; note that it was paid by the credit on the $\ell - k$ explicit letters $a$).

Then the sum of the weights in such a graph is the cost of representing the blocks using the grammar scheme (up to a constant factor). We shall transform this graph into a similar one that corresponds to the actual representation cost of TtoG. The transformation does not increase the overall weight, which will end the proof.

Firstly, let us sort the nodes according to the increasing length of the powers (if there are duplicates, we can remove any of the copies, redirecting its incoming edges to the other copy). For each node $a_\ell$, we redirect its edge to its direct predecessor $a_k$ and label it with the cost $1 + \log(\ell - k)$. This cannot increase the cost:

• if $a_\ell$ represents a power, then it has an edge labelled with $1 + \log \ell \ge 1 + \log(\ell - k)$;

• otherwise it had an edge to some $k' \le k$ with a label $\ell - k'$. Then $1 + \log(\ell - k) \le \ell - k \le \ell - k'$, as claimed (note that $1 + \log x \le x$ for $x \ge 1$).

Secondly, observe that all blocks represented in TtoG appear in $T$ and so they were also represented by the $G$-based representation. On the other hand, some of the blocks represented by $G$ perhaps were not represented by TtoG. For such a block we remove its node $a_\ell$ and redirect its unique incoming edge to its predecessor, say $a_{\ell'}$, changing the weight appropriately. Since $1 + \log x + 1 + \log y \ge 1 + \log(x + y)$ when $x, y \ge 1$, we do not increase the total weight.

In the end, we obtain a graph corresponding to the representation cost of TtoG, so the total cost of representing letters introduced during the block compression is at most $O(m \log N)$.
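
The first step of this transformation can be checked numerically with a small Python sketch. The encoding of the $G$-based scheme as a dictionary (block length mapped to `None` for a power, or to the smaller length $k'$ that it extends) and the function names are assumptions of this example, and logarithms are taken to base 2.

```python
import math
from typing import Dict, Iterable, Optional


def g_based_cost(scheme: Dict[int, Optional[int]]) -> float:
    """Sum of edge weights: 1 + log(l) for a power, l - k' for an extension of a_k'."""
    total = 0.0
    for length, smaller in scheme.items():
        if smaller is None:
            total += 1 + math.log2(length)
        else:
            total += length - smaller
    return total


def chain_cost(lengths: Iterable[int]) -> float:
    """Each block points to its direct predecessor by length, at cost 1 + log(l - k)."""
    total, prev = 0.0, 0                      # the empty block plays the role of epsilon
    for length in sorted(set(lengths)):
        total += 1 + math.log2(length - prev)
        prev = length
    return total


if __name__ == "__main__":
    scheme = {3: None, 5: 3, 6: 5, 40: None, 41: 40}   # hypothetical blocks of one letter
    print(chain_cost(scheme), g_based_cost(scheme))
```

For the hypothetical scheme above the chain costs roughly 12.67 against roughly 12.91 for the $G$-based graph, in line with the edge-by-edge argument.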

4. Improved analysis 

The naive algorithm, which simply represents the word as $X_1 \to w$, results in a grammar of size $N$. In some extreme cases this might be better than $O(g \log N)$, guaranteed by TtoG. We merge the naive approach with the recompression-based algorithm: if at the beginning of a phase $i$ TtoG already paid $k$ for the representation of the letters and the remaining text has size $|T|$, then a grammar for the input string of total size $k + |T| + 1$ can be easily given, and we choose the minimum over all possible $i$. We call the corresponding algorithm TtoGImp. Additionally, we show that when $|T| \approx g$ then the cost of representing letters is $O(g \log(N/g))$ and so the corresponding grammar considered by TtoGImp is of size $O(g + g \log(N/g))$; consequently, the grammar returned by TtoGImp is also of this size. This
matches the best known results for the smallest grammar problem \lb\ fH ITU]. 



Algorithm 9 TtoGImp: improved version outline
1: initialise the phase counter $i$
2: while $|T| > 1$ do
3:     size[$i$] $\leftarrow |T|$ + so-far cost of representing letters        ▷ Cost of the grammar from phase $i$
4:     $i \leftarrow i + 1$        ▷ Number of the phase
5:     $L \leftarrow$ list of letters in $T$        ▷ The compression is done as in TtoG
6:     for each $a \in L$ do
7:         compress maximal blocks of $a$
8:     $P \leftarrow$ list of pairs
9:     find partition of $\Sigma$ into $\Sigma_\ell$ and $\Sigma_r$
10:    for $ab \in P \cap \Sigma_\ell\Sigma_r$ do
11:        compress pair $ab$
12: output grammar $G_i$, for which size[$i$] is smallest
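
The bookkeeping that distinguishes TtoGImp from TtoG is only the choice of the best phase, which can be sketched in Python as follows. The function name `best_phase` and the input format (two parallel lists recorded at the beginning of each phase) are assumptions of this example, and the numbers in the usage line are made up.

```python
from typing import List, Tuple


def best_phase(text_len: List[int], letter_cost: List[int]) -> Tuple[int, int]:
    """Return (phase, size) minimising |T| + so-far cost of representing letters."""
    sizes = [t + c for t, c in zip(text_len, letter_cost)]
    phase = min(range(len(sizes)), key=sizes.__getitem__)
    return phase, sizes[phase]


if __name__ == "__main__":
    # Hypothetical run: |T| shrinks while the cost of the new letters grows.
    print(best_phase([16, 9, 5, 3], [0, 4, 9, 13]))
```

Stopping at the recorded phase and spelling out the remaining text explicitly is what allows the bound to depend on $N/g$ rather than on $N$.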



Lemma 12. If at the beginning of the phase $|T| \ge g$ then so far the cost of representing letters by TtoGImp is $O(g + g \log(N/g))$.

Proof. We estimate separately the amount of introduced credit and the cost of representation of blocks. This covers the whole cost of representing letters.

Credit. Observe first that the initial grammar had at most $g$ credit. The input text was of length $N$ and the current one is of length $t = |T|$, and so there were $O(\log(N/t))$ phases, as in each phase the length of $T$ drops by a constant factor (by the earlier lemma on the length of $T$ after a phase). As $t \ge g$, we obtain a bound $O(\log(N/g))$ on the number of phases. Due to Lemmata 7 and 10 at most $O(m)$ credit per phase is introduced during the pair compression and block compression, so in total $O(g + g \log(N/g))$ credit was introduced.

Powers. The representation of blocks and pairs used by TtoGImp is the same as the one of TtoG, so Lemma 11 applies as well and consequently the cost of the representation of blocks used by TtoGImp is not larger than the representation that is based on $G$. So we bound the size of the latter.

The outline of the analysis is as follows: when a new power is represented, we mark some letters of the input text. The marked blocks shall be disjoint and there shall be at most $4m$ of them. Thus, the total cost will be estimated as

\[
k + \sum_{i=1}^{k} \log p_i, \quad \text{where } k \le 4m \text{ and } \sum_{i=1}^{k} p_i \le N . \tag{2a}
\]

It is easy to show that (2a) is maximised for $k = 4m$ and each $p_i$ equal to $N/4m$: clearly, the sum is maximised for $\sum_{i=1}^{k} p_i = N$. Then for a fixed $k$ and $\sum_{i=1}^{k} p_i = N$ the sum $\sum_{i=1}^{k} \log p_i$ is maximised when all $p_i$ are equal, which follows from the fact that $\log(x)$ is concave; hence $p_i = N/k$. Lastly, $k + k \log(N/k)$ has a non-negative derivative (with respect to $k$) and so increases with $k$. Since $k \le 4m$, this is maximised for $k = 4m$. In this way the value of (2a) is at most

\[
4m + 4m \log\left(\frac{N}{4m}\right) = O\left(g + g \log\left(\frac{N}{g}\right)\right), \tag{2b}
\]

as $m \le g$.
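
The concavity step can be sanity-checked with a few lines of Python. The sketch compares the cost of a list of marking lengths with the value obtained when the same total length is split into equal parts; the function names and the sample values are assumptions of this example, and logarithms are taken to base 2.

```python
import math
from typing import List


def cost(ps: List[int]) -> float:
    """The quantity k + sum of log p_i for the marking lengths ps."""
    return len(ps) + sum(math.log2(p) for p in ps)


def bound(k: int, total: int) -> float:
    """The same quantity when `total` is split into k equal parts."""
    return k + k * math.log2(total / k)


if __name__ == "__main__":
    ps = [2, 3, 7, 20]                 # hypothetical marking lengths, summing to 32
    print(cost(ps), bound(len(ps), sum(ps)))
```

Unequal lengths always fall below the equal split with the same sum, and the equal split attains the bound exactly.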

It is left to show how these markings are defined. Observe that since we require that there are at most $4m$ of them, sometimes we will need to join markings. Thus, there is no one-to-one correspondence between markings and introduced powers; however, when the substrings of length $p_1, p_2, \ldots, p_k$ are marked, it means that the cost of representing the powers is $\Theta\bigl(\sum_{i=1}^{k} \log p_i\bigr)$.

We assign three kinds of markings to each nonterminal $X_i$ of $G$:

$X_i$-power marking: this means that $X_i$ was removed (i.e. $\mathrm{val}(X_i) = a^\ell b^r$) and this marking was introduced to pay for the cost of creating powers of $a$ and $b$;

$X_i$-pre-power marking: this means that one of the nonterminals in the rule for $X_i$ was removed and a new power was obtained by joining one of the prefix or suffix popped from $X_i$ with a block popped from the other nonterminal in this rule; this marking pays the cost of representing the new power;

$X_i$-in marking: when it was created, $X_i$ had two nonterminals inside its rule and this marking pays for (perhaps many) powers created inside the rule for $X_i$ (and perhaps also some other powers whose markings were removed in the process).

Note that by definition there are at most two $X_i$-power markings and one $X_i$-pre-power marking; however, we need to make sure that there is at most one $X_i$-in marking.

Whenever a power $a_\ell$, where $\ell > 1$, is represented, we find the right-most $a_\ell$ in $T$ and mark some of the letters in the input that are derived by this $a_\ell$. It is possible that this particular $a_\ell$ was obtained as a concatenation of $\ell - k$ explicit letters to $a_k$ (so, not as a power). In such a case we are lucky, as the representation of $a_\ell$ is paid by the credit.

So suppose that this $a_\ell$ is a power. Let $i$ be the smallest index such that $X_i$ generates this power; we assign this power to $X_i$. Consider the block $a^\ell$ and the substring $w^\ell$ of the input text derived by it. We show that if there are markings inside $w^\ell$, they are all inside the last $w$. Consider any pre-existing marking within $w^\ell$, say it was done when some $b^k$ was replaced by $b_k$. As $a$ is a single letter and $a^\ell$ derives this $b_k$, each $a$ derives at least one $b_k$. The marking was done inside the string generated by the right-most $b_k$, which is generated by the right-most $a$, i.e. inside the right-most $w$. So all markings within $w^\ell$ are in fact within the right-most $w$.

If there are no markings inside $w^\ell$ then we simply mark any $\ell$ letters within $w^\ell$. In the other case, take any marking in $w$ and let it have length $\ell'$. If this is the unique marking in $w$ then we remove it and mark arbitrary $\ell \cdot \ell'$ letters in $w^\ell$; this is possible, as $|w| \ge \ell'$. Since $\log(\ell \cdot \ell') = \log \ell + \log \ell'$, this is enough to account for the $1 + \log \ell = O(\log \ell)$ representation cost of $a_\ell$ as well as the $O(\log \ell')$ cost associated with the previous marking of length $\ell'$. If the marking inside $w$ is not unique, then $|w| \ge \ell' + 2$ (the 2 for the other markings). We remove the marking of length $\ell'$ and calculate how many unmarked letters are in $w^\ell$ afterwards: in $w^{\ell-1}$ there are at least $(\ell - 1)(\ell' + 2)$ letters (none of them marked) and in the last $w$ there are at least $\ell'$ unmarked letters (from the marking that we removed):

\begin{align*}
(\ell - 1)(\ell' + 2) + \ell' &= (\ell\ell' + 2\ell - \ell' - 2) + \ell' \\
&= \ell\ell' + 2\ell - 2 \\
&> \ell\ell' .
\end{align*}

We mark those $\ell\ell'$ letters; as in the previous case, the associated $\log(\ell\ell')$ is enough to pay for the cost.

Observe that there is one issue: it might be that we introduced a second $X_i$-in marking. However, we show that if there was such a marking, it was within $w^\ell$ (and so within the last $w$), and so we could choose it as the marking that was deleted when the new one was created. Consider the previous $X_i$-in marking. It was introduced for some power $b^k$ (replaced by $b_k$) that was between the nonterminals in the rule for $X_i$. Consider the rightmost substring of the input text that is generated by the explicit letters between the nonterminals in the rule for $X_i$. The operations performed on $G$ cannot shorten this substring; in fact they often expand it. When $b_k$ is created, this substring is generated by $b^k$. When $a_\ell$ is created, it is generated by $a^\ell$, i.e. this is exactly $w^\ell$. So in particular $w^\ell$ includes the marking for $b_k$.

This shows that there are at most $4m$ such markings and they are disjoint. Hence the calculations in (2a) and (2b) hold; in particular, the sum of logarithms of the lengths of the markings is $O(g \log(N/g))$. □

Let $t_1$ and $t_2$ be the lengths of $T$ at the beginning of two consecutive phases, such that $t_1 \ge g > t_2$. By Lemma 12 the cost of representing letters before $|T|$ was reduced to $t_2$ letters is $O(g + g \log(N/g))$. So it is left to estimate the cost of representation in this phase.

Lemma 13. Consider a phase such that at its beginning $T$ has length $t_1$ and after it $T$ has length $t_2$, where $t_1 \ge g > t_2$. Then the cost of representing letters introduced during this phase is at most $O(g + g \log(N/g))$.

Proof. The cost of credit introduced during the pair compression is at most $O(g)$, by Lemma 7. Consider the cost of representing blocks. Note that since $T$ at the end of the phase has $t_2$ letters, at most $2t_2$ powers could be introduced (since at most two powers can be merged into one letter by the pair compression afterwards). Let $p_1, \ldots, p_k$ be the lengths of those powers. Then the cost of representing them is proportional to

\[
k + \sum_{i=1}^{k} \log p_i, \quad \text{where } k \le 2t_2 \text{ and } \sum_{i=1}^{k} p_i \le t_1 .
\]

Since $k \le 2t_2 < 2g$ we only estimate the sum. Using the same analysis as in the case of (2a) it can be concluded that this is at most

\[
2t_2 \log\left(\frac{t_1}{2t_2}\right) \le 2t_2 \log\left(\frac{N}{2t_2}\right) \le 2g \log\left(\frac{N}{2g}\right) = O\left(g \log\frac{N}{g}\right),
\]

with the first inequality following from $t_1 \le N$ and the second from $g > t_2$ and monotonicity of $f(x) = x \log(N/x)$. □



Theorem 2. TtoGImp returns a grammar of size at most $O\left(g + g \log\left(\frac{N}{g}\right)\right)$, where $g$ is the size of the optimal grammar for the input text.

Proof. Consider the phase such that before it $T$ had length $t_1$ and right after it $t_2$, where $t_1 \ge g > t_2$. Then by Lemma 12 the cost of representing letters introduced before this phase is $O\left(g + g \log\left(\frac{N}{g}\right)\right)$, while by Lemma 13 the cost of representing letters introduced in this phase is at most $O\left(g + g \log\left(\frac{N}{g}\right)\right)$. Hence the size of the grammar that is calculated by TtoGImp after this phase is at most $O\left(g + g \log\left(\frac{N}{g}\right)\right)$. So the minimum found during the computation is also of at most this size. □
Appendix A. Sakamoto's algorithm [IS]

In the proof that bounds the number of introduced nonterminals (Theorem 2 there), it is first estimated that in one execution of the while loop for a factor $f_i$ the introduced nonterminals appear in $f_1 f_2 \cdots f_{i-1}$, except perhaps a constant number of them. This argument follows from the observation that $f_i$ is compressed to $\alpha\beta\gamma$, where $|\alpha|$ and $|\gamma|$ are bounded by a constant, and the earlier appearance of the same string as $f_i$ is compressed to $\alpha'\beta\gamma'$ (where also $|\alpha'|$ and $|\gamma'|$ are bounded by a constant). This is true; however, when $\alpha$ and $\gamma$ represent nonterminals introduced by the repetition procedure (i.e. they are blocks in the terminology used here) we also need to take into account the additional nonterminals that are introduced for the representation of those blocks. The estimation of $O(1)$ is not enough, as in the worst case $\Omega(\log N)$ nonterminals are needed to represent a single block of letters $a$. We do not see any easy patch to repair this flaw.

The improved analysis [W\ Theorem 2], in which the number of nonterminals is bounded 
by ^(7 + log ("^j) ) has the same shortcoming. 